Use CTRL/CMD + Shift + K to knit and preview your document. Click
the Visual button, or press
CTRL/CMD + Shift + F4, to switch to visual mode, which
lets you edit the formatted version in real time.
At the top of the .Rmd file (not shown in the rendered
result) is the YAML header. YAML (originally "Yet Another Markup
Language", now "YAML Ain't Markup Language") is a
human-readable data serialization language. The header sets options for
your document and gives you a nicely formatted preamble. It is currently
set to some of my preferred settings, but feel free to play around with
it and make it your own.
We have it set here to output HTML, but you can just as easily produce PDF or Word documents. There are a bunch of built-in themes to explore; I quite like the readable theme.
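To see what a YAML header actually holds, here is a minimal sketch that parses a header like this one into an R list using the yaml package (which rmarkdown depends on). The fields below are assumptions for illustration, not this document's exact header. Changing html_document to pdf_document or word_document is how you switch the output type.

```r
# A hypothetical YAML header, similar in spirit to the one at the top of this .Rmd
header <- "
title: My Report
output:
  html_document:
    theme: readable
    toc: true
    toc_depth: 3
    number_sections: true
"

# Parse the YAML text into a nested R list
parsed <- yaml::yaml.load(header)

# The structure is output -> html_document -> individual options
str(parsed)
parsed$output$html_document$theme
```

The nesting mirrors the indentation: each indented level becomes a sub-list, which is why indentation mistakes in the header are a common source of knitting errors.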
Below the YAML header in the .Rmd document you will find
the first code chunk. You will notice that this one does
not appear in the rendered document - this is because it is the
setup chunk, and it has the option
include=FALSE set. We use this to set default options for chunk
behavior and to load packages, data, and so on.
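A setup chunk might look something like the following minimal sketch. The specific options are assumptions chosen for illustration, not necessarily what this document's setup chunk contains.

```r
# In the .Rmd file this code sits inside a chunk whose header is
#   ```{r setup, include=FALSE}
# so it runs but never appears in the rendered document.

# Default options applied to every later chunk (each chunk can still override them)
knitr::opts_chunk$set(
  echo = TRUE,      # show code in the rendered document
  message = FALSE,  # hide messages such as package startup notes
  warning = FALSE   # hide warnings
)

# Packages and data are usually loaded here too, for example:
# library(ggplot2)
```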
The # above makes a header. A single # is
the largest header, and extra #s are smaller headers.
This header is automatically numbered because of the YAML settings
and the double #.
This is three #s.
This is four #s. Note that it does not show up in the
table of contents because we only asked it to keep track of the first
three levels.
Wrap text in single asterisks to make it italic.
Wrap text in double asterisks to make it bold.
Note that you need a blank line between paragraphs to split up text. Starting on a new line is not enough.
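To make bullet points, start each line with -.
To make numbered lists, start each line with a number and a period (1., 2., and so on).
To put code in-line, wrap it in single back ticks (`).
For multiple lines of verbatim code, put triple back ticks on the lines before and after the code: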
x + 1 = y
To make block quotes, start the line with a > character. Use these to emphasize a point.
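To check how these Markdown rules turn into formatted output, here is a minimal sketch that converts small snippets to HTML with the commonmark package. This assumes commonmark is installed; it is not used elsewhere in this document, and R Markdown itself renders through Pandoc, whose HTML may differ in small details.

```r
library(commonmark)

# One snippet per rule described above
snippets <- c(
  emphasis    = "*italic* and **bold**",
  bullets     = "- first\n- second",
  numbered    = "1. first\n2. second",
  inline_code = "Run `summary(x)` to see it",
  quote       = "> An important point",
  one_para    = "line one\nline two",   # no blank line: still one paragraph
  two_paras   = "line one\n\nline two"  # blank line: two paragraphs
)

# Convert each snippet to HTML and print the results
html <- vapply(snippets, markdown_html, character(1))
cat(html, sep = "\n")
```

The last two snippets show why a blank line is needed: a single line break only produces a soft break inside the same paragraph.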
Here we will explore some proper code chunks. You can use
CTRL/CMD + ALT/OPTION + I to create a new chunk. In the chunk header,
the chunk name comes right after the r (for example, {r my-chunk}).
A name is not required, but it is convenient if we hit an error,
because the error message will tell us which chunk the problem was in.
If we want to run code, but not show it, we can use
echo=FALSE in the chunk options.
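The effect of echo=FALSE can be checked outside RStudio. A minimal sketch that knits two tiny R Markdown chunks from text with knitr::knit(); the chunk names shown and hidden are hypothetical.

```r
# Two small chunks: one with default options, one with echo=FALSE
rmd_lines <- c(
  "```{r shown}",
  "1 + 1",
  "```",
  "",
  "```{r hidden, echo=FALSE}",
  "2 + 2",
  "```"
)

# With `text =`, knit() returns the rendered Markdown instead of writing a file
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

Both results appear in the output, but only the code of the first chunk does: the hidden chunk still runs, its source just is not echoed.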
By default (echo=TRUE), a chunk shows its code along with its output. Let’s show off our example regression from the FSCI paper.
lm <- lm(normvalue ~ year + FSCI_region, data = df)
summary(lm)
##
## Call:
## lm(formula = normvalue ~ year + FSCI_region, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.370 -8.881 -3.519 6.425 72.022
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 802.36120 99.04397 8.101
## year -0.39300 0.04928 -7.975
## FSCI_regionEastern Asia 11.45318 2.49413 4.592
## FSCI_regionLatin America & Caribbean 0.24044 1.75020 0.137
## FSCI_regionNorthern Africa & Western Asia -3.01052 1.81558 -1.658
## FSCI_regionNorthern America and Europe -7.85822 2.04028 -3.852
## FSCI_regionOceania -0.14376 2.09696 -0.069
## FSCI_regionSouth-eastern Asia 2.58542 1.95749 1.321
## FSCI_regionSouthern Asia 4.86597 2.03578 2.390
## FSCI_regionSub-Saharan Africa 15.54695 1.69547 9.170
## Pr(>|t|)
## (Intercept) 0.000000000000000845 ***
## year 0.000000000000002294 ***
## FSCI_regionEastern Asia 0.000004607788663308 ***
## FSCI_regionLatin America & Caribbean 0.89074
## FSCI_regionNorthern Africa & Western Asia 0.09741 .
## FSCI_regionNorthern America and Europe 0.00012 ***
## FSCI_regionOceania 0.94535
## FSCI_regionSouth-eastern Asia 0.18669
## FSCI_regionSouthern Asia 0.01691 *
## FSCI_regionSub-Saharan Africa < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.93 on 2484 degrees of freedom
## Multiple R-squared: 0.2395, Adjusted R-squared: 0.2368
## F-statistic: 86.93 on 9 and 2484 DF, p-value: < 0.00000000000000022
This shows our code and output, but the raw console output is not very nice to look at.
To get a cleaner regression table, we can convert the model
to a data frame with broom::tidy(), then use knitr::kable() to create a
nice-looking table.
lm_df <- broom::tidy(lm)
knitr::kable(lm_df)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 802.3611950 | 99.0439650 | 8.1010609 | 0.0000000 |
| year | -0.3930000 | 0.0492762 | -7.9754489 | 0.0000000 |
| FSCI_regionEastern Asia | 11.4531769 | 2.4941272 | 4.5920580 | 0.0000046 |
| FSCI_regionLatin America & Caribbean | 0.2404370 | 1.7502015 | 0.1373768 | 0.8907441 |
| FSCI_regionNorthern Africa & Western Asia | -3.0105244 | 1.8155793 | -1.6581619 | 0.0974110 |
| FSCI_regionNorthern America and Europe | -7.8582226 | 2.0402820 | -3.8515374 | 0.0001203 |
| FSCI_regionOceania | -0.1437598 | 2.0969580 | -0.0685563 | 0.9453483 |
| FSCI_regionSouth-eastern Asia | 2.5854171 | 1.9574857 | 1.3207847 | 0.1866948 |
| FSCI_regionSouthern Asia | 4.8659655 | 2.0357824 | 2.3902188 | 0.0169124 |
| FSCI_regionSub-Saharan Africa | 15.5469469 | 1.6954749 | 9.1696708 | 0.0000000 |
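If you would rather avoid the broom dependency, a similar data frame can be built from the coefficient matrix that summary() already returns. A minimal sketch, using the built-in mtcars data as a stand-in for the FSCI data; the object is named model (rather than lm) so it does not mask the lm() function.

```r
# Fit a small stand-in model on the built-in mtcars data
model <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# summary()$coefficients is a matrix with one row per term
coef_mat <- summary(model)$coefficients

# Turn it into a data frame with broom-style column names
coef_df <- data.frame(
  term = rownames(coef_mat),
  estimate = coef_mat[, "Estimate"],
  std.error = coef_mat[, "Std. Error"],
  statistic = coef_mat[, "t value"],
  p.value = coef_mat[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# kable's digits argument rounds numeric columns for display
knitr::kable(coef_df, digits = 3)
```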
Extra steps to get the column names capitalized and the numbers rounded:
lm_df_cleaner <- lm_df %>%
  dplyr::mutate(across(where(is.numeric), ~ round(.x, 3))) %>%
  setNames(snakecase::to_title_case(names(.)))
knitr::kable(lm_df_cleaner)
| Term | Estimate | Std Error | Statistic | P Value |
|---|---|---|---|---|
| (Intercept) | 802.361 | 99.044 | 8.101 | 0.000 |
| year | -0.393 | 0.049 | -7.975 | 0.000 |
| FSCI_regionEastern Asia | 11.453 | 2.494 | 4.592 | 0.000 |
| FSCI_regionLatin America & Caribbean | 0.240 | 1.750 | 0.137 | 0.891 |
| FSCI_regionNorthern Africa & Western Asia | -3.011 | 1.816 | -1.658 | 0.097 |
| FSCI_regionNorthern America and Europe | -7.858 | 2.040 | -3.852 | 0.000 |
| FSCI_regionOceania | -0.144 | 2.097 | -0.069 | 0.945 |
| FSCI_regionSouth-eastern Asia | 2.585 | 1.957 | 1.321 | 0.187 |
| FSCI_regionSouthern Asia | 4.866 | 2.036 | 2.390 | 0.017 |
| FSCI_regionSub-Saharan Africa | 15.547 | 1.695 | 9.170 | 0.000 |
For a very clean regression table with less work, try the
sjPlot package:
sjPlot::tab_model(
lm,
p.style = 'stars',
digits = 2,
show.se = TRUE,
robust = TRUE,
show.reflvl = TRUE
)
Dependent variable: normvalue

| Predictors | Estimates | std. Error | CI |
|---|---|---|---|
| (Intercept) | 802.36 *** | 104.19 | 598.06 – 1006.66 |
| Central Asia | Reference | | |
| Eastern Asia | 11.45 *** | 3.27 | 5.05 – 17.86 |
| Latin America & Caribbean | 0.24 | 1.65 | -3.00 – 3.48 |
| Northern Africa & Western Asia | -3.01 | 1.66 | -6.26 – 0.24 |
| Northern America and Europe | -7.86 *** | 1.70 | -11.18 – -4.53 |
| Oceania | -0.14 | 1.93 | -3.92 – 3.64 |
| South-eastern Asia | 2.59 | 1.73 | -0.80 – 5.98 |
| Southern Asia | 4.87 ** | 1.77 | 1.40 – 8.33 |
| Sub-Saharan Africa | 15.55 *** | 1.66 | 12.28 – 18.81 |
| year | -0.39 *** | 0.05 | -0.49 – -0.29 |
| Observations | 2494 | | |
| R2 / R2 adjusted | 0.240 / 0.237 | | |
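Note that this function takes the lm model object itself, not a data frame. It is designed to work with regression models and offers a ton of options for displaying them; check out the sjPlot documentation. Package documentation, written by the authors and full of vignettes and examples, is where you really learn how to use a package. Also note that because we set robust = TRUE, the standard errors and confidence intervals here are robust estimates, so they differ from those in summary(lm) (104.19 versus 99.04 for the intercept, for example), and a few significance stars change (Southern Asia goes from * to **).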
We’ve already seen how to make tables above. For static tables,
knitr::kable() is a good choice. The
kableExtra package is also a great extension to knitr,
giving you tons of options for customization. See the docs
for examples.
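Even without kableExtra, knitr::kable() has a few useful arguments of its own. A minimal sketch using the built-in iris data as a stand-in; the caption and column labels are illustrative.

```r
tab <- knitr::kable(
  head(iris, 3),
  format = "pipe",   # Markdown pipe table
  digits = 1,        # round numeric columns for display
  col.names = c("Sepal length", "Sepal width", "Petal length", "Petal width", "Species"),
  align = "rrrrl",   # right-align the four numeric columns, left-align Species
  caption = "First three rows of iris"
)
print(tab)
```

Inside a knitted document you would normally omit format and let knitr choose the right one for the output (HTML, LaTeX, or Word).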
For interactive tables, there are a couple of different options.
The DT package is a classic choice for interactive
tables. Note that we set echo=FALSE for this chunk, so its
code does not show up.
It takes almost no code and creates quite a decent-looking table with lots of options.
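Because that chunk is hidden, here is a minimal sketch of the kind of call DT needs. This is an assumption, not the author's hidden code, and it uses the built-in iris data as a stand-in.

```r
tbl <- DT::datatable(
  iris,
  filter = "top",                 # a filter box above each column
  rownames = FALSE,               # hide the row index column
  options = list(pageLength = 5)  # rows shown per page
)
tbl  # printing the widget shows it in the Viewer or in the rendered HTML
```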
My personal favorite for interactive tables is
reactable. The documentation is
excellent, so check it out if you’re interested.
reactable::reactable(
data = gapminder,
filterable = TRUE,
searchable = TRUE,
outlined = TRUE,
bordered = TRUE,
compact = TRUE,
striped = TRUE,
showPageSizeOptions = TRUE
)
I find the options for customization here much more intuitive than
DT, and the documentation is much easier to use.
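Much of that customization happens per column through colDef(). A minimal sketch with the built-in iris data as a stand-in; the header labels are illustrative.

```r
library(reactable)

tbl <- reactable(
  iris,
  defaultPageSize = 5,
  columns = list(
    # Rename the header and show one decimal place
    Sepal.Length = colDef(name = "Sepal Length", format = colFormat(digits = 1)),
    # Add a filter box to just this column
    Species = colDef(name = "Species", filterable = TRUE)
  )
)
tbl
```

Columns not listed in columns keep their default settings, so you only describe what you want to change.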
We haven’t really covered plots, but there is not much to it: put your plotting code in a chunk and the plot will appear in the rendered document.
Base plots with the plot() function are available in the
base R package. It works great for simple plots, but more elaborate and
pretty plots take much more work.
gapminder_2007 <- gapminder[gapminder$year == 2007, ]
plot(
x = gapminder_2007$gdpPercap,
y = gapminder_2007$lifeExp,
col = gapminder_2007$continent,
pch = 16,
cex = sqrt(gapminder_2007$pop) / 10000,
ylab = 'Life Expectancy',
xlab = 'GDP per Capita',
main = 'Life Expectancy against GDP per Capita'
)
legend(
"bottomright", legend = levels(gapminder_2007$continent),
col = 1:5, pch = 16, title = "Continent"
)
The ggplot2 package is one of the biggest strengths of R
in my opinion. It is an excellent package for making pretty plots
easily, with tons of extensions and extra packages for applications in
mapping, chord diagrams, dendrograms, animations, etc. For a nice
gallery of R graphs including example code, check out the R Graph Gallery.
gapminder %>%
filter(year == 2007) %>%
ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point() +
theme_classic() +
labs(
x = 'GDP per Capita',
y = 'Life Expectancy',
title = 'Life Expectancy against GDP per Capita'
)
We can change the alignment, size, and resolution of our plot, and add a caption, in the chunk options:
gapminder %>%
filter(year == 2007) %>%
ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point() +
theme_classic() +
labs(
x = 'GDP per Capita',
y = 'Life Expectancy',
title = 'Life Expectancy against GDP per Capita'
)
Caption Goes Here
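The chunk header for the plot above is not visible in the rendered output. A minimal sketch of the relevant knitr options, assuming illustrative values rather than the author's exact settings:

```r
# Per-chunk version (goes in the chunk header in the .Rmd, not in R code):
#   ```{r gdp-plot, fig.align='center', fig.width=8, fig.height=5, dpi=300, fig.cap='Caption Goes Here'}

# The same options can also be set as defaults for every chunk, e.g. in the setup chunk
knitr::opts_chunk$set(
  fig.align = "center",  # horizontal alignment of the figure
  fig.width = 8,         # width in inches
  fig.height = 5,        # height in inches
  dpi = 300              # resolution in dots per inch
)
```

Setting fig.cap globally rarely makes sense, since each figure usually needs its own caption, so it is left in the per-chunk header.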
What about an interactive plot? We can use the very popular
plotly package to do this. It is built on the plotly.js JavaScript
library, and the plotly R package gives us an easy way to access it. It has
its own syntax, but you can also use the ggplotly()
function to convert a ggplot object to a plotly object.
plot <- gapminder %>%
filter(year == 2007) %>%
ggplot(aes(
x = gdpPercap,
y = lifeExp,
color = continent,
size = pop,
text = paste0(
'Country: ', country, '\n',
'Continent: ', continent, '\n',
'Life Exp: ', lifeExp, '\n',
'Population: ', pop
)
)) +
geom_point() +
theme_classic() +
labs(
x = 'GDP per Capita',
y = 'Life Expectancy',
title = 'Life Expectancy against GDP per Capita'
)
ggplotly(plot, tooltip = 'text')
Note that you can hover over points to see the information from our
text aesthetic, and also move, zoom, select, and download a
static image of the plot.
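For comparison, here is a minimal sketch of plotly's own syntax, building a similar scatter plot directly with plot_ly() instead of converting a ggplot. It uses the built-in iris data as a stand-in for gapminder.

```r
library(plotly)

# Columns are referenced with formulas (~column)
p <- plot_ly(
  data = iris,
  x = ~Sepal.Length,
  y = ~Petal.Length,
  color = ~Species,
  type = "scatter",
  mode = "markers",
  text = ~paste("Species:", Species),  # custom hover text
  hoverinfo = "text"                   # show only that text on hover
)
p
```

ggplotly() is the quicker route when you already have a ggplot; plot_ly() gives more direct control over plotly-specific features.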
Great quick references:
If you want to dive deeper: